The recent interest in biased estimation procedures in multiple linear regression arises from the large variances of the (unbiased) least squares estimators of the regression coefficients when multicollinearities are present. Biased estimation procedures greatly reduce this variance at the cost of some bias. The purpose of this paper is to examine this bias with reference to the nature of the specific problem being investigated.

For a structure upon which to base this discussion, let the model be of the form

$$Y_i = B_0 + B_1 X_{i1} + \cdots + B_p X_{ip} + e_i, \qquad i = 1, 2, \ldots, n,$$

or

$$Y = \mathbf{1} B_0 + X B + e, \qquad (1)$$

where $Y' = (Y_1, Y_2, \ldots, Y_n)$; $\mathbf{1}$ is an $n \times 1$ vector of ones; $B_0$ and $B' = (B_1, B_2, \ldots, B_p)$ are unknown parameters; $X = \{x_{ij}\}$ is an $n \times p$ matrix of known values of rank $p$ with $X'\mathbf{1} = 0$ and the diagonal elements of $X'X$ equal to one; and $e' = (e_1, e_2, \ldots, e_n)$ with $e_i$ a random variable, $E(e_i) = 0$, $E(e_i^2) = \sigma^2$ and $E(e_i e_{i'}) = 0$ for $i \neq i'$.

There are basically two questions addressed in a multiple linear regression analysis: (i) the estimation of $B$, and (ii) the estimation of $Y$, a future value, when $X_1, X_2, \ldots, X_p$ are given. The answers are not necessarily the same, as will become apparent in our discussion. We shall in essence be looking for a solution to (i), but we also show that a poor solution to (i) may still be a good solution to (ii) in a restricted sense.

1. A Decomposition of B

When problems with multicollinearities arise, a tool often used is to express $B$ as a linear function of the latent vectors of $X'X$ (e.g. Hocking et al. (1976)):

$$B = \gamma_1 a_1 + \gamma_2 a_2 + \cdots + \gamma_p a_p, \qquad (2)$$

where $a_j$ is the normalized latent vector of $X'X$ associated with the latent root $\lambda_j$. Placing this form of $B$ into model (1) yields

$$Y = \mathbf{1} B_0 + \gamma_1 z_1 + \gamma_2 z_2 + \cdots + \gamma_p z_p + e,$$

where the $z_j = X a_j$ are mutually orthogonal, with $z_j'z_j = a_j'X'Xa_j = \lambda_j$. Hence the $\hat\gamma_j = z_j'Y / z_j'z_j$ are uncorrelated, unbiased estimators of the $\gamma_j$ with variance $\sigma^2/\lambda_j$.

A consequence of multicollinearities is some small $\lambda_j$ (Mason et al. (1975)); hence some $\hat\gamma_j$ will have large variances. The least squares estimator of $B$ will also have some elements with large variances, as can be seen from

$$b = (X'X)^{-1} X'Y = \hat\gamma_1 a_1 + \hat\gamma_2 a_2 + \cdots + \hat\gamma_p a_p, \qquad (3)$$

giving

$$\operatorname{Var}(b_i) = \sigma^2 \sum_{j=1}^{p} \frac{a_{ij}^2}{\lambda_j}, \qquad (4)$$

where $a_{ij}$ is the $i$th element of $a_j$.

One method of handling this problem is to use a biased estimator which in essence is the same as (3) except that $W_j \hat\gamma_j$ replaces $\hat\gamma_j$ as the coefficient of $a_j$ ($0 \le W_j \le 1$), that is, $\hat B = W_1\hat\gamma_1 a_1 + W_2\hat\gamma_2 a_2 + \cdots + W_p\hat\gamma_p a_p$. Then the corresponding term in (4) is $W_j^2 a_{ij}^2 \sigma^2/\lambda_j$. The idea is to let $W_j$ be small if $\lambda_j$ is small. Some of the more commonly used biased estimators have $W_j$ of the following forms:

Estimator               $W_j$
Simple Ridge            $\lambda_j/(\lambda_j + k)$;    $k > 0$, $j = 1, 2, \ldots, p$
Generalized Ridge       $\lambda_j/(\lambda_j + k_j)$;  $k_j > 0$, $j = 1, 2, \ldots, p$
Principal Components    $0$ or $1$;                     $j = 1, 2, \ldots, p$
Shrunken                $c$;                            $0 < c < 1$, $j = 1, 2, \ldots, p$

The bias of this estimator of $B$ would then be

$$E(\hat B) - B = \sum_{j=1}^{p} (W_j - 1)\,\gamma_j a_j.$$

In addition to selecting the $W_j$ so that the variances of the resulting estimators are not extremely large, we wish to choose the $W_j$ so that the bias is not great. Since $a_j$ is known and $W_j$ is chosen by us, the size of this bias depends on the unknown $\gamma_j$.
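The decomposition (2)-(3) and the weights in the table above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' program: the simulated data, the function names (`latent_decomposition`, `ridge_w`, `pc_w`, `shrunken_w`, `biased_estimate`) and the tuning values k, c and keep are hypothetical choices. It shows that all $W_j = 1$ reproduces least squares and that $W_j = \lambda_j/(\lambda_j + k)$ reproduces the familiar ridge formula $(X'X + kI)^{-1}X'Y$.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it so that diag(X'X) = 1, as assumed in model (1)."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc ** 2).sum(axis=0))


def latent_decomposition(X, y):
    """Latent roots (largest first), latent vectors a_j (columns of A) and gamma_hat_j."""
    lam, A = np.linalg.eigh(X.T @ X)       # eigh returns the roots in ascending order
    order = np.argsort(lam)[::-1]
    lam, A = lam[order], A[:, order]
    Z = X @ A                               # z_j = X a_j, mutually orthogonal
    gamma_hat = (Z.T @ y) / lam             # z_j'Y / z_j'z_j, using z_j'z_j = lambda_j
    return lam, A, gamma_hat


def ridge_w(lam, k):
    """Simple ridge (scalar k) or generalized ridge (array of k_j)."""
    return lam / (lam + k)


def pc_w(lam, keep):
    """Principal components: W_j = 1 for the `keep` largest roots, 0 otherwise."""
    return (np.arange(lam.size) < keep).astype(float)


def shrunken_w(lam, c):
    """Shrunken estimator: the same factor c for every j."""
    return np.full(lam.size, c)


def biased_estimate(A, gamma_hat, W):
    """B_hat = sum_j W_j gamma_hat_j a_j; W_j = 1 for all j gives least squares b of (3)."""
    return A @ (W * gamma_hat)


# Hypothetical data: x2 is nearly a copy of x1, which produces one small latent root.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = standardize(np.column_stack([x1, x2, x3]))
y = X @ np.array([1.0, 1.0, 0.5]) + 0.1 * rng.normal(size=n)
y = y - y.mean()                            # B_0 is estimated by the mean of Y since X'1 = 0

lam, A, gamma_hat = latent_decomposition(X, y)
b = biased_estimate(A, gamma_hat, np.ones(lam.size))
b_ridge = biased_estimate(A, gamma_hat, ridge_w(lam, 0.1))
b_pc = biased_estimate(A, gamma_hat, pc_w(lam, 2))
b_shrunk = biased_estimate(A, gamma_hat, shrunken_w(lam, 0.9))

print("latent roots:", np.round(lam, 4))
print("least squares:", np.round(b, 3))
print("simple ridge (k = 0.1):", np.round(b_ridge, 3))
print("principal components (keep 2):", np.round(b_pc, 3))
print("shrunken (c = 0.9):", np.round(b_shrunk, 3))
```

Working through the $\hat\gamma_j$ rather than calling a ridge routine directly makes each estimator differ only in its vector of $W_j$, which is the point of view taken in the text.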

Physically this tree also stands out from the others, with $X_5 = 9$ primary stems. We discarded this tree as an outlier and continued our analysis using the remaining 19 data points. These had a coefficient of determination of $R^2 = .868$.

Table A.3 contains the standardized $X'X$ matrix. The latent roots of this matrix are $\lambda_1 = 3.33$, $\lambda_2 = .89$, $\lambda_3 = .57$, $\lambda_4 = .12$, $\lambda_5 = .09$, indicating two multicollinearities, although they are not as severe as one often encounters. The variance inflation factors, the diagonal elements of $(X'X)^{-1}$, are $(5.90, 4.58, 6.02, 5.03, 1.37)$. They also point out the existence of some multicollinearities and further indicate that plant density, $X_5$, is not involved in them.
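If the 19 retained rows of Table A.2 are used as transcribed here, the quantities quoted above can be recomputed. The sketch below is a hedged check, not the authors' computation: it standardizes the five predictors so that $X'X$ becomes the correlation matrix, takes the latent roots with `numpy.linalg.eigvalsh`, and takes the VIFs as the diagonal of the inverse. Compare the printed values with the rounded figures in the text rather than assuming they agree.

```python
import numpy as np

# Mesquite predictors from Table A.2 with bush 2 (DENS = 9) removed, leaving n = 19.
# Columns: DIAM1, DIAM2, TOTHT, CANHT, DENS.
X_raw = np.array([
    [2.50, 2.30, 1.70, 1.40, 5], [2.00, 1.60, 1.70, 1.40, 1],
    [1.60, 1.60, 1.60, 1.30, 1], [1.40, 1.00, 1.40, 1.10, 1],
    [3.20, 1.90, 1.90, 1.50, 3], [1.90, 1.80, 1.10, 0.60, 1],
    [2.40, 2.40, 1.60, 1.10, 3], [2.50, 1.80, 2.00, 1.30, 7],
    [2.10, 1.50, 1.25, 0.85, 1], [2.40, 2.20, 2.00, 1.50, 2],
    [2.40, 1.70, 1.30, 1.20, 2], [1.90, 1.20, 1.45, 1.15, 2],
    [2.70, 2.50, 2.20, 1.50, 3], [1.30, 1.10, 0.70, 0.70, 1],
    [2.90, 2.70, 1.90, 1.90, 1], [2.10, 1.00, 1.80, 1.50, 2],
    [4.10, 3.60, 2.00, 1.50, 2], [2.80, 2.50, 2.20, 1.50, 1],
    [1.27, 1.00, 0.92, 0.62, 1],
])

Xc = X_raw - X_raw.mean(axis=0)
Xs = Xc / np.sqrt((Xc ** 2).sum(axis=0))     # standardized: X'1 = 0 and diag(X'X) = 1
R = Xs.T @ Xs                                 # the standardized X'X (a correlation matrix)

lam = np.sort(np.linalg.eigvalsh(R))[::-1]    # latent roots, largest first
vif = np.diag(np.linalg.inv(R))               # variance inflation factors

print("latent roots:", np.round(lam, 3))
print("VIFs:", np.round(vif, 2))
print("sum of VIFs:", round(float(vif.sum()), 2),
      "= sum of 1/lambda:", round(float((1 / lam).sum()), 2))
```

Because $\operatorname{tr}(X'X) = p$ for standardized data, the five roots must sum to 5, a quick sanity check on any transcription (the quoted roots give $3.33 + .89 + .57 + .12 + .09 = 5.00$).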

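The "total variance" of $b$, the sum of the variances of these five estimators, is

$$\sigma^2 \operatorname{tr}\big[(X'X)^{-1}\big] = 22.90\,\sigma^2.$$

This can also be found from

$$\sigma^2 \sum_{j=1}^{5} \frac{1}{\lambda_j},$$

since the latent roots of $(X'X)^{-1}$ are the $1/\lambda_j$. From this second form we note that if $W_4 = W_5 = 0$, the resulting biased estimator of $B$ has total variance

$$\sigma^2 \sum_{j=1}^{3} \frac{1}{\lambda_j} = 3.18\,\sigma^2.$$

Thus there is room for a considerable reduction in variance using an estimator of the form

$$\hat B = \hat\gamma_1 a_1 + \hat\gamma_2 a_2 + \hat\gamma_3 a_3 + W_4 \hat\gamma_4 a_4 + W_5 \hat\gamma_5 a_5. \qquad (6)$$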
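The arithmetic behind these total variances can be checked directly from the rounded values quoted in the text. In this minimal sketch, $\lambda_1$ to $\lambda_3$ are taken as printed and $\lambda_4$, $\lambda_5$ to four places from Table A.4; the VIF list is the one given above.

```python
# Latent roots of the standardized X'X for the mesquite data, as quoted in the text.
lam = [3.33, 0.89, 0.57, 0.1155, 0.0904]
vif = [5.90, 4.58, 6.02, 5.03, 1.37]      # diagonal of (X'X)^{-1}

total_from_vif = sum(vif)                  # trace of (X'X)^{-1}
total_from_roots = sum(1 / l for l in lam) # same trace via the latent roots
total_w45_zero = sum(1 / l for l in lam[:3])  # W_4 = W_5 = 0 drops the last two terms

print(f"trace from VIFs:   {total_from_vif:.2f} sigma^2")    # 22.90
print(f"trace from roots:  {total_from_roots:.2f} sigma^2")  # 22.90
print(f"with W4 = W5 = 0:  {total_w45_zero:.2f} sigma^2")    # 3.18
```

Nearly 86% of the least squares total variance comes from the two smallest roots alone ($1/.1155 + 1/.0904 \approx 19.72$).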


With this in mind, the question then is how much bias would be introduced. The latent vectors $a_4$ and $a_5$ are given in Table A.4. Note that the fifth elements of $a_4$ and $a_5$ are both relatively small; thus the introduction of small $W_4$ and $W_5$ will have little effect upon $\hat B_5$. Also, the variance inflation factor of 1.37 indicates that there is little room for improvement on $b_5$.

An Investigation of the Bias

The bias is affected by both $\gamma_4$ and $\gamma_5$. We have already noted that $\hat\gamma_j = z_j'Y / z_j'z_j$ is an unbiased estimator of $\gamma_j$ with variance $\sigma^2/\lambda_j$. Since $\lambda_4$ and $\lambda_5$ are both quite small, $\hat\gamma_4$ and $\hat\gamma_5$ both have exceedingly large variances. What this means can be shown graphically.

Due to the orthogonality of the $z_j$, and since $z_j'\mathbf{1} = 0$,

$$\hat\gamma_j = \frac{z_j'Y}{z_j'z_j} = \frac{z_j'\left(Y - \mathbf{1}\bar Y - \hat\gamma_1 z_1 - \cdots - \hat\gamma_{j-1} z_{j-1} - \hat\gamma_{j+1} z_{j+1} - \cdots - \hat\gamma_p z_p\right)}{z_j'z_j}.$$

The long expression in parentheses on the right is $Y$ adjusted for the intercept and all the $z$'s except $z_j$. Call this partial residual vector $r_j$. Then $\hat\gamma_j = r_j'z_j / z_j'z_j$ is the least squares slope of $r_j$ versus $z_j$, a slope that is easily visualized through a plot of $r_j$ vs. $z_j$. Figures A1 through A5 show this for the five $z$'s of this example.

FIGURE A1. PARTIAL RESIDUAL PLOT OF FIRST PRINCIPAL COMPONENT
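The identity $\hat\gamma_j = r_j'z_j / z_j'z_j$ can be checked numerically. This sketch uses simulated, hypothetical data with one near-collinearity; `partial_residual` is a name introduced here for illustration. It also shows why the plots for small roots span so narrow a range: since $z_j'z_j = \lambda_j$, every element of $z_j$ lies within $\pm\sqrt{\lambda_j}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 25, 4
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)      # hypothetical near-collinearity
X = X - X.mean(axis=0)
X = X / np.sqrt((X ** 2).sum(axis=0))              # X'1 = 0 and diag(X'X) = 1
y = 5.0 + X @ np.array([1.0, 1.0, -0.5, 0.3]) + 0.2 * rng.normal(size=n)

lam, A = np.linalg.eigh(X.T @ X)                   # ascending roots, orthonormal a_j
Z = X @ A                                          # principal components z_j
gamma_hat = (Z.T @ y) / lam                        # z_j'Y / z_j'z_j (intercept drops out)


def partial_residual(j):
    """r_j: Y adjusted for the intercept and for every z except z_j."""
    others = [k for k in range(p) if k != j]
    return y - y.mean() - Z[:, others] @ gamma_hat[others]


j = int(np.argmin(lam))                            # component with the smallest root
r = partial_residual(j)
slope = (r @ Z[:, j]) / (Z[:, j] @ Z[:, j])        # least squares slope of r_j on z_j

print(f"smallest root {lam[j]:.4f}: slope {slope:.3f}, gamma_hat {gamma_hat[j]:.3f}")
print("range of z_j:", round(float(Z[:, j].min()), 3), "to", round(float(Z[:, j].max()), 3))
print("bound sqrt(lambda_j):", round(float(np.sqrt(lam[j])), 3))
```

Plotting `r` against `Z[:, j]` gives exactly the kind of partial residual plot shown in Figures A1 through A5.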


FIGURE A2. PARTIAL RESIDUAL PLOT OF SECOND PRINCIPAL COMPONENT

FIGURE A3. PARTIAL RESIDUAL PLOT OF THIRD PRINCIPAL COMPONENT

FIGURE A4. PARTIAL RESIDUAL PLOT OF FOURTH PRINCIPAL COMPONENT

For a moment, ignore the source of these plots and consider that each of these five plots was brought to you separately and you were asked to find a predictor for each. We believe that for Figures A4 and A5 you would quite politely say that there is not enough information to establish a relationship between the two variables. There may or may not be a relationship, but data are needed outside the range of ±.2. For example, look at Figures A6 and A7. (Figure A6 is the composite of Figures A1 and A5; Figure A7 is the composite of the reflection of Figure A1 and Figure A5.) Both of these are compatible with Figure A5, and either one could be the result of data for $z_5$ outside of ±.2. The crux of the matter is that the available data give us little or no information on $\gamma_4$ and $\gamma_5$.

FIGURE A6. PARTIAL RESIDUAL PLOT OF FIFTH PRINCIPAL COMPONENT AUGMENTED WITH POINTS FROM FIGURE A1

FIGURE A7. PARTIAL RESIDUAL PLOT OF FIFTH PRINCIPAL COMPONENT AUGMENTED WITH REFLECTED POINTS FROM FIGURE A1


There is, however, one other available source of information: the problem itself and the experimenter knowledgeable in the area. Because of the simplicity of the measurements in this problem we can perhaps wear his hat, keeping in mind that the variable for prediction is the amount of leaves.

Looking at the nature of $X_1$ through $X_4$, we see that all four are size measurements and increasing any one should indicate more leaves. Thus $B_1$ through $B_4$ should all be positive. Further, the change of scale using the standardized $X$'s suggests that $B_1$ and $B_2$ may be of about the same magnitude, as should perhaps $B_3$ and $B_4$.

Now let us look at $\gamma_4$ and $\gamma_5$. Multiplying both sides of (2) on the left by $a_j'$ yields

$$\gamma_j = a_j' B,$$

so that

$$\gamma_4 = .62B_1 - .47B_2 - .52B_3 + .34B_4 + .03B_5,$$

$$\gamma_5 = -.44B_1 + .42B_2 - .54B_3 + .55B_4 + .17B_5.$$

From what this problem suggests about these $B$'s, we would not expect either of these $\gamma$'s to be large. Note that the signs on $B_1$ and $B_2$ (and on $B_3$ and $B_4$) are opposite, yet we expect each pair to have the same sign and roughly equal magnitude, so their effects should largely cancel in $\gamma_4$ and $\gamma_5$. Thus to estimate $B$ using (6) with small $W_4$ and $W_5$ would greatly reduce the variances with the introduction of little bias.
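The cancellation argument can be made concrete with the three-place latent vectors behind the expressions for $\gamma_4$ and $\gamma_5$. In this sketch the coefficient vector `B` is purely hypothetical, chosen only to follow the experimenter's reasoning (all positive, $B_1 \approx B_2$, $B_3 \approx B_4$); it is not an estimate from the data.

```python
import numpy as np

# Latent vectors a_4 and a_5 (elements for X1..X5), matching the expressions for gamma_4, gamma_5.
a4 = np.array([0.624, -0.472, -0.522, 0.339, 0.026])
a5 = np.array([-0.444, 0.420, -0.544, 0.551, 0.165])

# Hypothetical B: all positive, B1 close to B2, B3 close to B4 (illustrative values only).
B = np.array([1.0, 1.0, 0.5, 0.5, 0.2])

gamma4 = float(a4 @ B)        # gamma_j = a_j'B, from (2) and the orthonormality of the a_j
gamma5 = float(a5 @ B)

print(f"gamma4 = {gamma4:.3f}, gamma5 = {gamma5:.3f}, |B| = {np.linalg.norm(B):.3f}")
print("a4'a5 =", round(float(a4 @ a5), 4), "(near 0: the latent vectors are orthogonal)")
```

Both $\gamma$'s come out small relative to the size of $B$, which is what makes small $W_4$ and $W_5$ cheap in bias for this problem.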

4. The Pitprop Data

Jeffers (1967) presented data on the maximum compressive strength of timbers used in mines. A description of the variables is given in Table B.1 and the standardized $X'X$ matrix in Table B.2. This $X'X$ matrix has three small latent roots,

$$\lambda_{13} = .0387, \qquad \lambda_{12} = .0415, \qquad \lambda_{11} = .0506.$$

Their corresponding latent vectors are given in Table B.3.

For this example, wearing the hat of the experimenter is more difficult, so let us concentrate on $X_1$ and $X_2$, the top diameter of the prop and the length of the prop. One would expect that the greater the diameter, the greater the strength; hence $B_1$ should be positive. With the same line of thinking, the longer the prop, the less its supporting strength; hence $B_2$ should be negative.


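Now look at $\gamma_{12}$ and $\gamma_{13}$:

$$\gamma_{12} = -.39B_1 + .41B_2 + \cdots,$$

$$\gamma_{13} = -.57B_1 + .58B_2 + \cdots.$$

From the contributions of $B_1$ and $B_2$ one would expect $\gamma_{12}$ and $\gamma_{13}$ not to be small: with $B_1$ positive and $B_2$ negative, the two terms have the same sign and reinforce each other. Thus small values of $W_{12}$ and $W_{13}$ in $\hat B$ could lead to considerable bias in the estimator of $B$.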
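The contrast with the mesquite case can be seen by evaluating only the $B_1$ and $B_2$ terms. This sketch uses the coefficients from Table B.3 and hypothetical unit values for $B_1$ and $B_2$; the remaining terms of $\gamma_{12}$ and $\gamma_{13}$ (the "$\cdots$") are deliberately omitted.

```python
# Coefficients of B1 and B2 in gamma_12 and gamma_13 (Table B.3); other terms omitted.
coefficients = {"gamma_12": (-0.392, 0.411), "gamma_13": (-0.572, 0.582)}

signs_expected = (1.0, -1.0)   # hypothetical: B1 > 0 (diameter), B2 < 0 (length)
same_sign = (1.0, 1.0)         # for contrast only: equal values of the same sign


def partial_gamma(c, B):
    """Contribution of B1 and B2 to gamma_j = a_j'B."""
    return c[0] * B[0] + c[1] * B[1]


expected = {name: partial_gamma(c, signs_expected) for name, c in coefficients.items()}
contrast = {name: partial_gamma(c, same_sign) for name, c in coefficients.items()}
for name in coefficients:
    print(f"{name}: {expected[name]:+.3f} (B1 > 0, B2 < 0) vs {contrast[name]:+.3f} (B1 = B2)")
# gamma_12: -0.803 (B1 > 0, B2 < 0) vs +0.019 (B1 = B2)
# gamma_13: -1.154 (B1 > 0, B2 < 0) vs +0.010 (B1 = B2)
```

With opposite signs the two terms add instead of cancelling, which is why small $W_{12}$ and $W_{13}$ are risky here.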

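5. The Effect of $W_j$ on the Prediction of Y

Up to this time we have been concerned with the estimation of $B$. Now let us consider the use of this estimate to predict $Y$ at a point $x$. (Note that the $X$'s in this vector are scaled in the same manner as those in our $X$ matrix.) Writing $\hat B$ in the form

$$\hat B = W_1\hat\gamma_1 a_1 + W_2\hat\gamma_2 a_2 + \cdots + W_p\hat\gamma_p a_p,$$

the predictor will be

$$\hat Y = \bar Y + W_1\hat\gamma_1\, x'a_1 + W_2\hat\gamma_2\, x'a_2 + \cdots + W_p\hat\gamma_p\, x'a_p. \qquad (7)$$

The bias in the predictor is

$$E(\hat Y) - (B_0 + x'B) = -\left[(1-W_1)\gamma_1\, x'a_1 + (1-W_2)\gamma_2\, x'a_2 + \cdots + (1-W_p)\gamma_p\, x'a_p\right].$$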

The bias will be small if, for each small $W_j$, either $\gamma_j$ or $x'a_j$ is small. Let us say that $\lambda_p$ was small, indicating a multicollinearity with coefficients identified in $a_p$. $W_p$ is then chosen to be small in order to reduce the variance of $\hat B$ but, unfortunately, suppose $\gamma_p$ is not near zero. The $p$th term in the bias will still contribute little if $x'a_p$ is near zero, counteracting the nonzero $\gamma_p$. Thus, if the point of prediction satisfies this multicollinearity of the original data (that is, $x'a_p \approx 0$, just as every element of $z_p = Xa_p$ is near zero because $z_p'z_p = \lambda_p$), a small $W_p$ will induce little bias in the predictor.

To summarize this last paragraph, consider that the $W_j$ are chosen small only when $\lambda_j$ is small. Then, regardless of the magnitude of the $\gamma_j$ (within reason), the $\hat Y$ of (7) will have little bias if the point of prediction, $x$, satisfies the multicollinearities of the original data.
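The bias expression can be evaluated directly to see the effect of the prediction point. In this sketch the data are simulated, the coefficient vector `B` is hypothetical, and `prediction_bias` is an illustrative helper implementing $-\sum_j (1-W_j)\gamma_j\, x'a_j$. The component with the smallest root is dropped ($W = 0$) even though its $\gamma$ is far from zero; the bias then vanishes at a point with $x'a = 0$ and equals $-\gamma$ at a point with $x'a = 1$.

```python
import numpy as np


def prediction_bias(x, A, gamma, W):
    """Bias of Y_hat at point x: -sum_j (1 - W_j) gamma_j x'a_j."""
    return float(-np.sum((1.0 - W) * gamma * (A.T @ x)))


rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n), rng.normal(size=n)])
X = X - X.mean(axis=0)
X = X / np.sqrt((X ** 2).sum(axis=0))

lam, A = np.linalg.eigh(X.T @ X)      # ascending: lam[0] is the smallest root (lambda_p)
B = np.array([2.0, -1.0, 0.5])        # hypothetical B with a sizable gamma on that root
gamma = A.T @ B                       # gamma_j = a_j'B
W = np.ones(3)
W[0] = 0.0                            # drop the component with the smallest root

x_follows = A[:, 1] + A[:, 2]         # x'a_p = 0: obeys the multicollinearity
x_breaks = x_follows + A[:, 0]        # x'a_p = 1: violates it

bias_follows = prediction_bias(x_follows, A, gamma, W)
bias_breaks = prediction_bias(x_breaks, A, gamma, W)

print("smallest root:", round(float(lam[0]), 5), " gamma_p:", round(float(gamma[0]), 3))
print("bias when x obeys the multicollinearity:", round(bias_follows, 6))
print("bias when x violates it:", round(bias_breaks, 3))
```

The same $\hat B$ is thus a good predictor at some points and a poor one at others, which is the restricted sense in which a poor answer to (i) can still be a good answer to (ii).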




For example, a prediction equation from the pitprop data using small $W_{12}$ and $W_{13}$ could be quite satisfactory for props with the multicollinearities designated by $a_{12}$ and $a_{13}$. However, interpretation of the $\hat B_j$ in this equation could be misleading. On the other hand, interpretation of the $\hat B_j$ from the mesquite data using small $W_4$ and $W_5$ should be informative.

6.  Conclusion 

In biased linear regression techniques, small $W_j$ are desired in order to reduce the variance of $\hat B$ in the presence of multicollinearities. However, this will introduce considerable bias in $\hat B$ if the corresponding $\gamma_j$ are large. The basic point of this paper is that when $\lambda_j$ is small the data itself generally gives little information on $\gamma_j$. This is well illustrated by the graphs of the mesquite data. Some information may be available, however, if $\gamma_j$ is expressed as a linear function of the elements of $B$ and the nature of the specific problem is carefully analyzed.

We feel that in practice a complete investigation of the properties of a set of regression data, as suggested above, is sometimes hampered by the lack of computer programs. Robert Pierce, while at SMU, has written a program, REGRESS, which will simultaneously do a least squares, ridge, latent root and principal component analysis as well as furnish a shrunken estimate. The latent roots and vectors of $X'X$ are a portion of the printout. A copy of this program is available upon request.


MESQUITE DATA:

TABLE A.1
RESPONSE AND PREDICTOR VARIABLES

Variable      Description
Y  (LEAFWT)   Total Weight (GRAMS) of Photosynthetic Material
X1 (DIAM1)    Canopy Diameter (METERS) Measured Along the Longest Axis of the Tree Parallel to the Ground
X2 (DIAM2)    Canopy Diameter (METERS) Measured Along the Shortest Axis of the Tree Parallel to the Ground
X3 (TOTHT)    Total Height (METERS) of the Mesquite Bush
X4 (CANHT)    Canopy Height (METERS) of the Mesquite Bush
X5 (DENS)     Plant Unit Density (NUMBER OF PRIMARY STEMS/PLANT UNIT)

TABLE A.2
MESQUITE DATA

  Bush
Number   DIAM1   DIAM2   TOTHT   CANHT   DENS   LEAFWT
     1    2.50    2.30    1.70    1.40      5    723.0
     2    5.20    4.00    3.00    2.50      9   4052.0
     3    2.00    1.60    1.70    1.40      1    345.0
     4    1.60    1.60    1.60    1.30      1    330.9
     5    1.40    1.00    1.40    1.10      1    163.5
     6    3.20    1.90    1.90    1.50      3   1160.0
     7    1.90    1.80    1.10     .60      1    386.6
     8    2.40    2.40    1.60    1.10      3    693.5
     9    2.50    1.80    2.00    1.30      7    674.4
    10    2.10    1.50    1.25     .85      1    217.5
    11    2.40    2.20    2.00    1.50      2    771.3
    12    2.40    1.70    1.30    1.20      2    341.7
    13    1.90    1.20    1.45    1.15      2    125.7
    14    2.70    2.50    2.20    1.50      3    462.5
    15    1.30    1.10     .70     .70      1     64.5
    16    2.90    2.70    1.90    1.90      1    850.6
    17    2.10    1.00    1.80    1.50      2    226.0
    18    4.10    3.60    2.00    1.50      2   1745.1
    19    2.80    2.50    2.20    1.50      1    908.0
    20    1.27    1.00     .92     .62      1    213.5

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nui  A.3  table  B.l 


CORBZIATIOM  MATRIX  CT  PREDICTOR  AMD 

RESPONSE 

PITPROP  DATA: 

RESPONSE  AND  PREDICTOR  VARIABLES 

1 VARIABLES 

, MBSQOin  DATA  (n  • 19) 

♦ 

• 

1 ^ 

VARIABLE 

DESCRIPTION 

*1 

*2 

*3  *4 

*5 

X,  l.OO 

.88 

.73  .68 

.32 

Y 

MaxlBua  Coaprasaive  Strength  of 

1 

Pr^ 

*5 

1.00 

.64  .58 

.20 

[ 2 

*1 

Top  Diaaeter  of  the  Prop  (INCHES) 

X 

1.00  .88 

.39 

3 

*2 

Length  of  frop  (INCHES) 

X. 

1.00 

.22 

4 

Nolature  Content  of  Prop  («  OP 

1 

1.00 

J 

DRY  HEIGRr) 

■ 

*4 

Specific  (Cavity  of  the  Tlat>er 

(AT  TINE  or  TEST) 

*5 

Ovei>-dry  Specific  Gravity  of  the 

i 

Tlaber 

! 

TABLE 

A.4 

*6 

Nuaber  of  Annual  Rings  at  Top  of 

SHALUST  TWO 

LATENT 

ROOTS  AND  CORRESPONDING 

Prop 

LATENT  VECTORS 

OP  X'X 

, MESQUITE  DATA 

(n  • 19) 

*7 

NMber  of  Annual  Rings  at  Base 

of  Prop 

*5  ■ 

.0904 

- .1155 

! 

*8 

Maxlaw  Bow  (INCHES) 

1 VARIABLE 

a. 

a . 

VIF 

Distance  of  the  Point  of  Maxlaua 

-5 

=4 

*9 

\ 

« 

Bow  froa  Top  of  Prop  (INCHES) 

X, 

.444 

.624 

5.9 

1 ^ 

*10 

Nuaber  of  Knot  Whorls 

' X, 

.420 

-.472 

4.6 

Length  of  Clear  Prop  froa  Top  of 

* 

*11 

1 *3 

.544 

-.522 

6.0 

Prop  (INCHES) 

’ _ 

Li  4 

.551 

.339 

5.0 

*12 

Average  Nuaber  of  Knots  per  Whorl 

1 

.165 

.026 

1.4 

*13 

Average  Diaaeter  of  Knots  (INCHES) 

I 


t' 

f 


TABLE B.2
CORRELATION MATRIX OF PREDICTOR AND RESPONSE VARIABLES, PITPROP DATA

         X1    X2    X3    X4    X5    X6    X7    X8    X9   X10   X11   X12   X13
X1     1.00   .95   .36   .34  -.13   .31   .50   .42   .59   .55   .08  -.02   .13
X2           1.00   .30   .28  -.12   .29   .50   .42   .65   .57   .08  -.04   .14
X3                 1.00   .88  -.15   .15  -.03  -.05   .13  -.08   .16   .22   .13
X4                       1.00   .22   .38   .17  -.06   .14  -.01   .10   .17   .02
X5                             1.00   .36   .30   .00  -.04   .04  -.09  -.15  -.21
X6                                   1.00   .81   .09   .21   .27  -.04   .02  -.33
X7                                         1.00   .37   .47   .68  -.11  -.23  -.42
X8                                               1.00   .48   .56   .06  -.36  -.20
X9                                                     1.00   .53   .09  -.13  -.08
X10                                                          1.00  -.32  -.37  -.29
X11                                                                1.00   .03   .01
X12                                                                      1.00   .18
X13                                                                            1.00
Y      -.42  -.34  -.73  -.54   .25   .12   .11  -.25  -.24  -.10  -.06  -.12  -.15

TABLE B.3
SMALLEST THREE LATENT ROOTS AND CORRESPONDING LATENT VECTORS OF X'X, PITPROP DATA

$\lambda_{13} = .0387$     $\lambda_{12} = .0415$     $\lambda_{11} = .0506$

Variable      a13      a12      a11    VIF
X1          -.572    -.392    -.005     13
X2           .582     .411    -.054     14
X3           .408    -.527     .117     12
X4          -.383     .585    -.017     12
X5           .118    -.202     .005      3
X6           .057    -.080    -.537      7
X7           .002     .036     .764     12
X8           .018     .053     .026      2
X9          -.058    -.054    -.051      2
X10          .004    -.060    -.318      5
X11         -.007    -.005    -.048      2
X12          .004    -.002     .047      1
